Abstract

Background: Most proteins rarely act as single entities. Multisubunit macromolecular complexes are the
actors in most cellular processes. These nanomachines are held together by weak protein-protein
interactions and undergo functionally important conformational changes. TFIID is such a multiprotein complex 
acting in eukaryotic transcription initiation. This complex is first to be recruited to the promoter of the genes and 
triggers the formation of the transcription preinitiation complex involving RNA polymerase II which leads to gene 
transcription. The exact role of TFIID in this process is not yet understood. 

Methods: Latest-generation electron microscopes, improved data collection and new image analysis tools have made it
possible to obtain structural information on biological molecules at atomic resolution. Cryo-electron microscopy of
vitrified samples visualizes proteins in a fully hydrated, close to native state. Molecular images are recorded at 
liquid nitrogen temperature in low electron dose conditions to reduce radiation damage. Digital image analysis of 
these noisy images aims at improving the signal-to-noise ratio, at separating distinct molecular views and at 
reconstructing a three-dimensional model of the biological particle. 

Results: Using these methods we showed the early events of an activated transcription initiation process. We 
explored the interaction of the TFIID coactivator with the yeast Rap1 activator, the transcription factor TFIIA and
the promoter DNA. We demonstrated that TFIID serves as an assembly platform for transient protein-protein 
interactions, which are essential for transcription initiation. 

Conclusions: Recent developments in electron microscopy have provided new insights into the structural 
organization and the dynamic reorganization of large macromolecular complexes. Examples of near-atomic 
resolutions exist but the molecular flexibility of macromolecular complexes remains the limiting factor in most cases.
Electron microscopy has the potential to provide both structural and dynamic information of biological assemblies 
in order to understand the molecular mechanisms of their functions. 



Background 

Genomic sequences are now available for many different
organisms which, when combined with biocomputing
analysis, result in the annotation of most of the coding
regions that define the protein repertoire of an
organism. Systematic protein purification experiments
revealed that proteins act rarely as single entities but are 
generally associated into well-defined complexes, 80% of 
which contain between 5 and 12 distinct proteins [1]. 
Interestingly, several proteins show some degree of infidelity
and can be found in distinct complexes. Moreover,
the documented complexes correspond only to the most
stable molecular interactions that resist the harsh protein
purification conditions. Many more transient interactions
are likely to occur between proteins and protein
complexes to build up the intricate and robust molecular
interaction network that governs cell fate.

Macromolecular complexes are therefore at the center 
of most biological processes. They integrate spatially 
several catalytic or structural activities with built-in regulatory
functions. In most cases, conformational
changes that range from the atomic to the molecular scale are
instrumental in explaining the function of these complexes.
Together, these dynamic properties and the
size of the particles, which ranges between 10 and 40 nm,
justify the name of nanomachines often given to
these complexes. These nanomachines are targeted by
most of the currently available drugs used to cure 
human diseases, but the vast majority of these drugs
inhibit a catalytic activity carried by a single subunit.
Only on rare occasions are the intrinsic mechanical properties
or the specific protein-protein interaction network
of a complex targeted by drugs. The ribosome, which is
responsible for protein synthesis, is one such nanomachine
for which several examples of drugs targeting the
mechanical properties are at hand [2]. Macrolides and
other antibiotics affect the translocation of the ribosome
along the mRNA and thus inhibit protein synthesis.
Fusidic acid was shown to prevent the dynamic turnover
of the elongation factor G and thus affects the interaction
of the ribosome with this regulatory factor. Finally,
antibiotics such as dalfopristin or quinupristin were
found to bind to the ribosome exit channel and to
mechanically block the progression of the nascent polypeptide.
Few other examples of drugs targeting so clearly the
intrinsic mechanical properties of a complex have been
described so far. This is related to the poor structural
information available to date on complexes, since most
of the atomic structures deposited in the Protein Data
Bank are single polypeptides.

This tutorial aims to describe the molecular organization
of the general transcription factor TFIID as a
prime example of a multi-protein complex, and to emphasize the
role of cryo-electron microscopy (cryo-EM) and digital
image analysis in integrating structural and functional
information in order to reach a mechanistic model of
the complex.

Methods 

Cryo-EM of frozen hydrated molecular complexes 

Imaging of single particles by electron microscopy and 
numerical analysis of image datasets have proven invalu- 
able tools to describe the structural organization of large 
macromolecular assemblies. Since the discovery of nega- 
tive stain by Brenner and Horne in 1959, single particles 
embedded in a layer of heavy atom salts can be visua- 
lized through the high contrast provided by the elec- 
tron-dense material that surrounds the biological 
macromolecule composed of low atomic number
elements, which poorly scatter electrons (Figure 1) [3].
Despite its ability to provide high contrast, to reveal fine 
structural details and to sustain fragile structures, the 
negative staining approach is limited to the description 
of surface features and rarely extends beyond 15-20 Å
resolution. 

A major breakthrough was achieved by the discovery 
in the early 1980s of a robust specimen preparation
method that preserved specimen hydration in the 
vacuum of the electron microscope [4,5]. The method 
relies on the fast vitrification of a thin aqueous layer 
containing the specimen by plunging into a liquid 
ethane slush (Figure 1). This procedure prevents ice 
crystal formation that segregates particles and ruins 
image quality. The frozen hydrated sample has to be 
observed at low temperature, typically close to liquid 
nitrogen temperature, to prevent phase transitions and 
special cold stages were developed for cryo-EM observa- 
tions. This groundbreaking technology opened new hor- 
izons for the observation of macromolecular complexes. 
It allows unconstrained particle conformations in the 
absence of any crystal contacts and in close to physiolo- 
gical ionic strength and pH conditions. In contrast to 
crystallized conditions, in which a particular conforma- 
tion is selected, a flexible particle will be able to adopt 
all permitted conformations. Conformational flexibility 
may be detrimental for structure determination since 
fine structural details may be averaged out, but cryo-EM
records conformational intermediates and thus holds the
promise to detect and describe particle dynamics. Early
electron diffraction experiments showed that in such
frozen hydrated conditions, the structure of the specimen
is preserved down to atomic resolution, thus
showing for the first time that electron microscopy
images have the potential to reveal the same structural
information as X-ray diffraction [6].

Figure 1 Preparation of purified molecular complexes for electron microscopy. In negative stain the specimen is adsorbed on a carbon film, embedded in a layer of heavy metal salts and dried. In frozen hydrated conditions the molecules are embedded in a thin layer of vitrified buffer suspended in a hole of the carbon film. Corresponding electron micrographs are shown (right panels). The bar represents 50 nm.

The resolving power of modern electron microscopes 
is sufficient to image single atoms. In material sciences, 
the specimen is very stable and a huge number of elec- 
trons can interact with the sample often without affect- 
ing its structure. As a result, individual atoms can be 
detected with good statistical significance, that is, a high signal-to-noise
ratio (SNR). Imaging of biological samples benefits fully
from the most recent developments in instrumentation,
such as field emission guns, which give a
much brighter and more coherent electron beam, detec- 
tors and microscope automation. Specific instrumenta- 
tion is needed to observe frozen hydrated samples 
which includes cold stages to keep the specimen tem- 
perature below -170°C, anti-contamination devices to 
prevent deposition of traces of water present in the 
microscope onto the cold specimen as well as low-dose 
imaging protocols to avoid irradiation of the specimen 
before data acquisition. Structural damage induced by
the electron beam is a strongly limiting factor for biological
specimens. Inelastic interactions of incident electrons
with sample atoms dissipate energy that can break
covalent bonds and generate highly reactive side chains.
It is generally accepted that the atomic structure of the
specimen is preserved when electron doses are kept
below 5 electrons per square ångström (e⁻/Å²); however,
this number varies with the acceleration voltage of the
electrons, and at 300 kV it can be up to 25 e⁻/Å² [7]. In
these conditions the molecular images are so noisy that
the fine structural details cannot be detected. As a rule
of thumb, at an electron dose of 5 e⁻/Å², details in the
range of 50 Å can be detected with an SNR of two, while
smaller details are below this detection limit. To reconcile
low specimen irradiation, which leads to noisy
images, with the high SNR needed to detect small
details, it is necessary to split the dose required to detect
atoms (say 2000 e⁻/Å²) over several independent particles
(in this case 400) to keep the dose below 5 e⁻/Å²,
and to add up the signal coming from all these images.
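
The dose-splitting argument above is simple arithmetic and can be sketched as follows. The function name is ours and purely illustrative; the doses are the ones quoted in the text, and the calculation assumes that every particle image receives the same dose.

```python
import math

def particles_needed(total_dose, max_dose_per_particle):
    """Smallest number of independent particle images whose summed dose
    reaches total_dose while each image stays at or below the tolerated
    dose. Both doses are in electrons per square angstrom."""
    return math.ceil(total_dose / max_dose_per_particle)

# Values quoted in the text: 2000 e-/A^2 to detect atoms, 5 e-/A^2 tolerated.
print(particles_needed(2000, 5))   # 400
# With the higher tolerance quoted for 300 kV (25 e-/A^2):
print(particles_needed(2000, 25))  # 80
```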

The ongoing development of highly sensitive direct
detection cameras and single electron counting devices
is important to record highly magnified images of biological
complexes with the best detective quantum efficiency
and with reduced noise [8]. Automation of
cameras and electron microscopes facilitates the recording
of several hundred to several thousand images per day,
each containing 100-200 particles, thus generating huge
image datasets which, as we will describe, are important
for reaching the final spatial resolution [9]. The
need for reduced electron irradiation also led to dedi- 
cated "low dose" data acquisition strategies in which 
microscope adjustments such as focus and astigmatism 
corrections are performed on an area remote from the area
of interest to be recorded "blindly". 

Single particle image analysis 

The objectives of single particle image analysis are twofold
[10]. The first goal is to improve the SNR of the original 
images by averaging the signal from independent parti- 
cles. Assuming a Gaussian distribution of all sources of 
noise that can affect the original molecular image 
(statistics of electron-matter interaction, detector noise, 
cosmic rays, etc.), image averaging will improve the SNR 
and increase the spatial resolution that can be detected. 
The second goal of image analysis is to reach a volu- 
metric description of the sample. Standard transmission 
electron microscopes provide 2-D projections of the 3-D 
electron density map of the sample, multiplied by a 
microscope-specific Contrast Transfer Function, which 
has to be corrected for. The objective is to determine the 
projection (or viewing) direction of each 2-D image with 
respect to the 3-D object it originates from and to recon- 
struct a 3-D model by combining many 2-D views. A 
brief overview of the image analysis protocol is shown in 
Figure 2. 
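
The first goal, noise reduction by averaging, can be illustrated with a small simulation. This is a toy sketch under stated assumptions: the "image" is a one-dimensional sine profile, the noise is Gaussian and independent between copies, and all copies are already perfectly aligned. None of these names or values come from the original work. Under these assumptions the residual noise of the average is expected to fall roughly as the noise level divided by the square root of the number of images.

```python
import math
import random

def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average(images):
    """Pixel-wise average of a list of equally sized images."""
    n = len(images)
    return [sum(px) / n for px in zip(*images)]

def rms_error(image, signal):
    """Root-mean-square deviation of an image from the noise-free signal."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image, signal)) / len(signal))

rng = random.Random(0)                      # fixed seed for reproducibility
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]
sigma = 5.0                                 # noise much stronger than the signal

for n in (1, 16, 256):
    avg = average([noisy_copy(signal, sigma, rng) for _ in range(n)])
    # measured residual noise next to the idealized expectation sigma / sqrt(n)
    print(n, round(rms_error(avg, signal), 3), round(sigma / math.sqrt(n), 3))
```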

Alignment and clustering 

Images of the same particle can be averaged to improve
the SNR only if two criteria are met: firstly they have to 
correspond to the same view or projection of the parti- 
cle and secondly the images have to be in the same reg- 
ister, or in other words aligned in translation and in 
rotation one with respect to the others. The spatial reso- 
lution that can be reached will depend on the number 
of images that can be averaged, on how similar the 
views are, and on the alignment quality. If a tolerance of
10° in viewing direction is accepted, the finest dimension
that can be resolved for a globular particle 10
nm in diameter cannot be smaller than 5 nm × sin 10° ≈ 0.8
nm. To reach a resolution of 0.2 nm, the variation in
viewing direction cannot be larger than 2.2°.
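
The geometric estimate above can be checked numerically. The sketch below assumes, as the text does, that the blurring at the surface of a globular particle is its radius multiplied by the sine of the angular tolerance; the function names are ours. Evaluated without rounding, a 10° tolerance gives about 0.87 nm and a 0.2 nm target gives a tolerance of about 2.3°, close to the rounded values quoted in the text.

```python
import math

def finest_detail_nm(diameter_nm, tolerance_deg):
    """Smallest resolvable detail for a globular particle when views that
    differ by up to tolerance_deg are averaged together."""
    radius = diameter_nm / 2
    return radius * math.sin(math.radians(tolerance_deg))

def max_tolerance_deg(diameter_nm, detail_nm):
    """Largest angular tolerance compatible with a target detail size."""
    radius = diameter_nm / 2
    return math.degrees(math.asin(detail_nm / radius))

print(round(finest_detail_nm(10, 10), 3))    # 0.868
print(round(max_tolerance_deg(10, 0.2), 2))  # 2.29
```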

A molecular image is aligned against a reference 
image by correlating image intensities. The correlation 
coefficient between two images is a measure of their
similarity, and all possible translations and rotations
are explored to find the correlation maximum,
which is taken as their best alignment. The
quality of the alignment depends on many parameters 
such as the initial SNR of the image, the size and the 
shape of the particle. 
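
The exhaustive search described above can be sketched on a toy image. This is a minimal illustration, not the procedure of any particular software package: rotations are restricted to multiples of 90° and translations to cyclic pixel shifts so that no interpolation is needed, whereas real alignment programs sample rotations finely and usually compute correlations with Fourier transforms. All function names are hypothetical.

```python
def rotate90(img):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def shift(img, dy, dx):
    """Cyclically shift an image by dy rows and dx columns."""
    n_rows, n_cols = len(img), len(img[0])
    return [[img[(r - dy) % n_rows][(c - dx) % n_cols] for c in range(n_cols)]
            for r in range(n_rows)]

def correlation(a, b):
    """Pearson correlation coefficient between two images of equal size."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def apply_transform(img, quarter_turns, dy, dx):
    """Rotate by quarter_turns * 90 degrees, then shift by (dy, dx)."""
    for _ in range(quarter_turns):
        img = rotate90(img)
    return shift(img, dy, dx)

def align(image, reference):
    """Try every rotation and shift; return (best_cc, quarter_turns, dy, dx)."""
    best = None
    for k in range(4):
        for dy in range(len(image)):
            for dx in range(len(image[0])):
                cc = correlation(apply_transform(image, k, dy, dx), reference)
                if best is None or cc > best[0]:
                    best = (cc, k, dy, dx)
    return best

# Toy reference: an L-shaped motif on a 6 x 6 grid.
reference = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# The "particle image" is the reference, shifted and then rotated.
image = rotate90(shift(reference, 1, 2))
cc, k, dy, dx = align(image, reference)
print(round(cc, 3), k, dy, dx)
```

Applying the returned transform to the image with `apply_transform(image, k, dy, dx)` brings it back into register with the reference. With noisy data the correlation maximum is lower and broader, which is why the alignment quality depends on the SNR, size and shape of the particle.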

Figure 2 Schematic representation of the key steps in the analysis of single particle images.

In a real image data set, the particles have different
orientations that lead to different views that need to be
separated before calculating an average image. The
images need therefore to be clustered into groups con- 
taining the most similar images. The image intensity 
variance should be minimized within the same group, 
while it should be maximized between different groups. 
In practical terms the image data set is first subjected to 
a multivariate statistical analysis (Principal Component 
Analysis or Correspondence Analysis) to detect the 
most meaningful trends in the data set and the clustering
is then performed on the most significant eigenvectors
using Hierarchical Ascendant Classification
schemes.
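
The clustering step can be sketched as follows. The example assumes that the multivariate statistical analysis has already been performed and that each image is represented by its coordinates on the first two eigenvectors (the coordinates below are invented for illustration). The merging rule is Ward's minimum-variance criterion, a common choice for hierarchical ascendant classification because it keeps the intra-class variance as small as possible at each merge; whether a given software package uses exactly this criterion is an assumption.

```python
def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in total within-class variance caused by merging two classes."""
    d2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * d2

def hierarchical_ascendant_classification(points, n_classes):
    """Start with one class per image and merge the cheapest pair of classes
    until n_classes remain. Returns lists of image indices."""
    classes = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(classes) > n_classes:
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                a, b = classes[i], classes[j]
                cost = ward_cost(len(a["members"]), a["centroid"],
                                 len(b["members"]), b["centroid"])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = classes[i], classes[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # centroid of the merged class is the size-weighted mean
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        classes = [c for k, c in enumerate(classes) if k not in (i, j)] + [merged]
    return sorted(sorted(c["members"]) for c in classes)

# Hypothetical factor coordinates of six images forming two distinct views.
factor_coordinates = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                      (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
image_classes = hierarchical_ascendant_classification(factor_coordinates, 2)
print(image_classes)  # [[0, 1, 2], [3, 4, 5]]
```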

In an ideal image data set, the particles are randomly 
oriented which will produce an infinite number of pro- 
jections. This condition is not always met when particles 
are adsorbed on a supporting carbon film, which may 
lead to preferred orientations. Nevertheless, the number 
of different orientations is very large and it is virtually 
impossible to find two perfectly identical particle 
images. It is therefore important to consider an angular 
projection sector within which we consider the images 
to be identical at a defined spatial resolution. An image 
class can, to a first approximation, be considered as a
group of molecular images viewed along the same angular
sector. If we consider the above-mentioned 10 nm
globular particles, a 10° tolerance in projection angle
will result in an uncertainty of 0.8 nm. A projection sector
of 10° leads to 244 different views and the dataset
should be separated into as many classes.

The alignment and clustering steps are highly interde- 
pendent and will be used iteratively to improve the 
quality of the class averages. A better alignment will 
lead to an improved clustering which will impact the 
resolution of the class averages. Such high resolution 
class averages will further improve the alignment of the 
original images in a multi-reference alignment protocol. 

Three-dimensional model building

The class averages correspond to distinct views of the 
particle but their projection direction is not known a 
priori. A common-line based method was designed to 
attribute the relative projection directions of a set of 
class averages [11], but this method may lead to ambig- 
uous results especially when several conformations of 
the particle coexist. Two experimental methods, based
on the acquisition of tilted images of the same particle,
have been developed.
In the tomography approach, a goniometric electron
microscopy stage is used to record tilted views of the 
same object, typically between +70° and -70° with angu- 
lar increments of 1 or 2° [12]. After alignment of the 
images on a common origin, a 3-D model can be calcu- 
lated for each particle by combining all views for which 
the exact projection direction is experimentally deter- 
mined by the position of the tilt axis and the tilt angle. 
This method suffers from several drawbacks that have 
been partially addressed. Electron dose and therefore 
radiolytic damage accumulates during the sequential 
acquisition of around 140 images of the same particle, 
but the development of very sensitive low noise cameras 
restricts the total accumulated dose to 20 to 40 e⁻/Å²,
enough to reach a resolution of 3-4 nm. The data col- 
lection scheme produces a wedge of missing projections 
and this problem can be overcome by turning the grid 
90° in plane and by recording a second tilt series. The 
missing wedge will then be reduced to a missing 
pyramid and the quality of the reconstructed volume is 
generally improved. Alternatively, independently reconstructed
single particle volumes can be aligned in 3-D
and averaged. Since for each orientation of the particle 
the missing information is different, the averaged 
volume is essentially devoid of missing wedge artifacts. 
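
The figures quoted for a tilt series can be reproduced with a few lines. The sketch assumes a single-axis tilt series with equal angular increments and equal dose per image; the function names are ours. The missing wedge estimate is the standard geometric one for single-axis tilting, in which the unsampled wedge spans 2 × (90° − maximum tilt) out of 180°.

```python
def tilt_angles(max_tilt_deg, step_deg):
    """Tilt angles of a series running from -max_tilt_deg to +max_tilt_deg."""
    n_images = int(2 * max_tilt_deg / step_deg) + 1
    return [-max_tilt_deg + i * step_deg for i in range(n_images)]

def dose_per_image(total_dose, n_images):
    """Dose available for each image when the total dose is split evenly."""
    return total_dose / n_images

def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled by a single-axis tilt series."""
    return (90.0 - max_tilt_deg) / 90.0

angles = tilt_angles(70, 1)
print(len(angles))                                 # 141
print(round(dose_per_image(40, len(angles)), 2))   # 0.28 (e-/A^2 per image)
print(round(missing_wedge_fraction(70), 3))        # 0.222
```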

In the random conical tilt method, the data collection
strategy consists of first recording a 45-60° tilted view of
an electron microscopy field containing several particles
and, in a second exposure, an untilted view of the same
field [13]. The untilted images are analyzed as single particles,
producing classes that contain several images of
similarly oriented particles, each differing by its in-plane
or azimuthal angle. Together with the position of the tilt axis
and the tilt angle, this angle defines the viewing direction of
each corresponding tilted image and allows a 3-D model
to be calculated for each class of images. With this data
collection strategy, irradiation is limited to a single exposure
and the missing information is restricted to a cone.
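
The geometry of the random conical tilt can be sketched as follows. The example assumes a simple convention in which the in-plane (azimuthal) angle found by alignment of the untilted image and the tilt angle of the stage act as the two spherical angles of the viewing direction; actual software uses its own Euler angle conventions, and the function names are ours. The missing cone estimate follows from the fact that the viewing directions lie on a cone of half-angle equal to the tilt angle, leaving a cone of half-angle (90° − tilt) unsampled in Fourier space.

```python
import math

def viewing_direction(azimuth_deg, tilt_deg):
    """Unit vector of the projection direction of a tilted particle image."""
    phi = math.radians(azimuth_deg)
    theta = math.radians(tilt_deg)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def missing_cone_fraction(tilt_deg):
    """Fraction of directions in Fourier space that are never sampled."""
    return 1.0 - math.sin(math.radians(tilt_deg))

# Hypothetical in-plane angles of four particles from one class, tilted by 60 degrees.
for azimuth in (0, 90, 180, 270):
    print(azimuth, [round(c, 3) for c in viewing_direction(azimuth, 60)])

print(round(missing_cone_fraction(45), 3))  # 0.293
print(round(missing_cone_fraction(60), 3))  # 0.134
```
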

Model refinement

The experimental 3-D models are considered as low reso- 
lution "starting models" that will be used to determine 
the viewing direction of independently determined class 
averages obtained from a much larger image dataset. The 
starting models will be computationally "reprojected" 
along many directions to generate a set of reference 
images of known projection direction. The subsequent 
alignment of the class averages, or of the original images, 
against these reprojections in a process called reference- 
matching will determine the viewing direction for each 
high resolution class average and lead to an improved, or 
refined, 3-D model. 
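
Reprojection and reference matching can be illustrated on a toy volume. This sketch makes strong simplifying assumptions: the starting model is a tiny 4 × 4 × 4 grid, it is reprojected only along its three axes (real refinement samples many directions and requires interpolation), the class average is assumed to be already aligned in plane, and a small deterministic perturbation stands in for residual noise. All names are hypothetical.

```python
def project(volume, axis):
    """Sum a volume indexed as volume[z][y][x] along one axis (0=z, 1=y, 2=x)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[sum(volume[z][y][x] for z in range(nz)) for x in range(nx)]
                for y in range(ny)]
    if axis == 1:
        return [[sum(volume[z][y][x] for y in range(ny)) for x in range(nx)]
                for z in range(nz)]
    return [[sum(volume[z][y][x] for x in range(nx)) for y in range(ny)]
            for z in range(nz)]

def correlation(a, b):
    """Pearson correlation coefficient between two images of equal size."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_view(class_average, references):
    """Return the name and score of the best-correlating reprojection."""
    scores = {name: correlation(class_average, ref)
              for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Toy starting model: three arms of different lengths along z, x and y.
size = 4
volume = [[[0.0] * size for _ in range(size)] for _ in range(size)]
for z, y, x in [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
                (0, 0, 1), (0, 0, 2), (0, 1, 0)]:
    volume[z][y][x] = 1.0

# Reference images of known projection direction.
references = {"along z": project(volume, 0),
              "along y": project(volume, 1),
              "along x": project(volume, 2)}

# Simulated class average: the y projection plus a small perturbation.
class_average = [[v + 0.1 * ((r + c) % 2) for c, v in enumerate(row)]
                 for r, row in enumerate(references["along y"])]

print(best_view(class_average, references)[0])  # along y
```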

Addressing the dynamic properties of the complexes

The fast vitrification of the specimen in liquid ethane
preserves the hydrated state of the protein complexes
but also cryo-immobilizes their different conformational
states. This heterogeneity can hinder high-resolution 
structure refinement if different conformations are com- 
bined in a single class; however it contains essential 
information about the dynamic properties of the sample. 
For isolated particles, different conformations can be 
sorted out computationally when the data set is large 
enough, thus providing information on mobile parts of 
the complex. For transient multi-component systems, 
the relative abundance of the components present in an 
equilibrium state informs about the interaction con- 
stants. It is therefore crucial to detect and separate the 
conformational states of the specimen both to improve 
the resolution of each individual state and to describe 
the dynamics of the examined protein complex. Several 
methods exist to detect and visualize de novo structural 
heterogeneities in the specimen [14]. Large movements
of domains can be detected by either single-particle
tomography or random conical tilt experiments. More
subtle differences can be tracked by eigen-analysis
of resampled cryo-EM images [15,16]. In this method
the image dataset has to be aligned to an average reference
structure to determine the relative particle orientations.
A large number of volumes are built from
randomly drawn subsets of the dataset, and these
volumes are subjected to multivariate statistical analysis
followed by hierarchical classification to identify the
structural differences.
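
The resampling idea can be sketched on toy data. The example below illustrates only the resampling step and replaces the multivariate statistical analysis of the published methods by a simple per-voxel variance; it also treats each particle as a short one-dimensional density profile instead of back-projecting images of known orientation. The two conformations, the noise level and all names are invented for illustration.

```python
import random

def average(volumes):
    """Voxel-wise average of a list of equally sized volumes."""
    n = len(volumes)
    return [sum(v) / n for v in zip(*volumes)]

def resampled_volumes(particles, n_volumes, subset_size, rng):
    """Build n_volumes averages, each from a random subset drawn with replacement."""
    return [average([rng.choice(particles) for _ in range(subset_size)])
            for _ in range(n_volumes)]

def voxel_variance(volumes):
    """Sample variance of each voxel across the resampled volumes."""
    mean = average(volumes)
    n = len(volumes)
    return [sum((v[i] - mean[i]) ** 2 for v in volumes) / (n - 1)
            for i in range(len(mean))]

rng = random.Random(0)                      # fixed seed for reproducibility
state_a = [1.0, 1.0, 1.0, 0.0, 1.0]         # domain absent at voxel 3
state_b = [1.0, 1.0, 1.0, 1.0, 1.0]         # domain present at voxel 3
particles = [[v + rng.gauss(0.0, 0.1) for v in state]
             for state in [state_a] * 50 + [state_b] * 50]

volumes = resampled_volumes(particles, 200, 30, rng)
variance = voxel_variance(volumes)
most_variable = max(range(len(variance)), key=variance.__getitem__)
print(most_variable)  # 3, the voxel that differs between the two states
```

Voxels that are identical in all conformations vary only through noise, whereas a voxel occupied in one state and empty in the other varies strongly between resampled volumes, which is what allows the mobile part to be located.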

Results 

The general transcription factor TFIID 

Gene expression programs in metazoans are under tight 
control to achieve growth, development and differentiation
of the tissues that make up living organisms. A large
part of this regulation takes place at the transcriptional
level, when the information carried by specific DNA
sequences (the genes) is transcribed into a messenger
RNA molecule (mRNA) by the RNA polymerase II
enzyme. Misregulated gene expression underlies many
human pathologies, as indicated by germ-line and
somatic mutations in transcription regulatory genes that
lead to genetic disease [17-20], developmental syndromes
[21], neurological diseases [22], epigenetic perturbations
[23] and cancer [21,24]. Most intensively
studied is the initiation step, which determines which 
genes are turned on to express a specific piece of 
genetic information in response to external signaling 
events. Initiation of transcription is controlled by a large 
number of multiprotein complexes whose action results 
in the assembly of a transcription Preinitiation Complex 
(PIC) on the promoter DNA upstream of the coding 
sequence. The ultimate goal of the PIC is to position 
the RNA polymerase II at the transcription start site 
and to initiate the synthesis of mRNA [25,26].
The general transcription factor TFIID is a key player
in the initiation process since it is the first factor to 
interact with promoter DNA and directs the following 
steps that result in the onset of transcription [27]. This 
1 MDa TFIID multiprotein complex contains a protein 
recognizing the TATA-box in the gene promoter (TBP) 
and 13 TBP Associated Factors (TAFs) whose sizes vary 
between 10 and 250 kDa. To modulate gene expression 
in response to external signals, small activator or repres- 
sor proteins bind upstream of gene promoters and 
recruit the transcriptional coactivators and the general 
transcriptional machinery. In this process human TFIID 
not only recognizes the promoter DNA region of genes 
but also acts as a transcriptional coactivator by interacting
with several activators such as p53, Sp1 and c-Jun
[28]. TFIID thus acts as a bridge between transcriptional 
activator proteins and the PIC. 

Structure of TFIID, a hybrid approach

How is gene transcription initiated and what is the role 
of activators and co-activators in this process? How is 
the activation signal transmitted from the activator to 
the general transcription factors and finally to the RNA 
polymerase? How do cells integrate and respond to reg- 
ulatory signals? What sets different gene expression 
levels in specific cell types? Which errors in this process 
lead to disease? To answer these questions the functions 
of TFIID have to be explained in mechanistic detail by
determining its biophysical and structural properties. 

A large body of structural data is available at atomic 
scale for single subunit TAF domains and small TFIID 
sub-assemblies such as the TATA box binding protein 
[29], the histone-fold containing TAF heterodimers 
[30-32], the N-terminus of TAF5 [33], the HEAT
repeats of TAF6 [34] or the TAF1 bromodomain [35].
Single-crystal X-ray diffraction and NMR spectroscopy
have determined the atomic structures of parts of TFIID
that sum up to about 40% of its total mass, but little is
known about the organization of these bricks into a
functional TFIID assembly that is active in transcription.
The full complex resists crystallization since this
large multisubunit complex is difficult to produce in
large quantities and with purity suitable for crystal
growth. This observation is general to most fields of
biology and led to the development of so-called hybrid 
methods that integrate structural information from dif- 
ferent sources. Cryo-electron microscopy is instrumental 
to this integration since it provides medium (10-20 Å)
resolution maps of large complexes into which atomic- 
scale information obtained by X-ray crystallography or 
NMR spectroscopy can be fitted [36]. 

Low-resolution studies by negative-stain [37,38] and,
more recently, cryo-EM [39] have revealed the general
shape of TFIID and allowed approximate localization of
several subunits by means of antibody labeling [40,41].
Samples used in these studies were prepared from endo- 
genous sources and resulted in spatial resolutions that 
were seriously hampered by the dynamic properties, het- 
erogeneous nature and the low abundance of material. 
The lack of recombinant TFIID complexes of suitable 
quality and quantity for molecular level studies has been 
an insurmountable bottleneck to date for structural but 
also for functional studies. Recent developments in 
recombinant protein production were instrumental for 
solving the structure of the core-TFIID complex at 11 Å
resolution most probably because several sources of het- 
erogeneity arising from TAF isoforms and posttransla- 
tional modifications were reduced [42]. 

To gain insights into the function of TFIID, the interaction
of yeast TFIID with the promoter DNA was studied
in the presence of TFIIA (a general transcription
factor required for specific recognition of the TATA element)
and the Rap1 activator [43]. The cryo-EM results
revealed the network of interactions and the conformational
changes occurring during complex formation. The
path of DNA was detected in the complex, and these
findings extended our understanding of how TFIID
recognizes DNA in the presence of trans-acting
factors. The resulting structure has shed new light on the
intramolecular communication pathways conveying transcription
activation signals through the TFIID coactivator
(Figure 3). This study revealed an interaction between
TFIIA and Rap1 that forms a protein bridge between TBP
and the DNA-bound Rap1, which results in a large
change in the position of TFIIA and of TBP. Interestingly,
the concomitant binding of promoter DNA to
TFIID-bound Rap1 and to TBP loops out the intervening
DNA, thereby accommodating variable distances between
Rap1 binding sites and the transcription start site.

Conclusions 

The development of cryo-EM and image analysis 
software has provided new insights into the structural 
organization and the dynamic reorganization of large 
macromolecular complexes. Recent improvements in 
electron microscopy instrumentation allow for auto- 
mated processing and recording of large image datasets, 
with an improved image quality due to more stable cold 
stages and advanced electron optics. With these developments,
unprecedented close-to-atomic resolutions
have been obtained for highly symmetric biological assemblies
such as icosahedral viruses. It can be anticipated that
the analysis of large datasets, together with new data acquisition
strategies that compensate for particle movement
during acquisition, will routinely provide molecular
models at resolutions better than 5 Å in the near future. The analysis
of molecular flexibility still requires algorithmic
developments to describe concomitantly the high resolution
structure and the continuous conformational space
of a macromolecular complex. The unique asset of
cryo-EM, however, resides in the possibility to record
images of single particles which collectively contain both
structural and dynamic information.

Figure 3 Structure of TFIID. (A) Structure of TFIID and of its complexes with TFIIA, DNA and the Rap1 activator. (B) Structure of the core-TFIID at 11.4 Å resolution and fitting of the known atomic structures. (C) Position of the core-TFIID within the complete TFIID complex. The bar represents 7.6 nm in (A), 5 nm in (B) and 5.8 nm in (C).